AITopics | sample selection

Collaborating Authors

sample selection

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers

Neural Information Processing Systems | Jun-23-2026, 03:53:09 GMT

Poison-only Clean-label Backdoor Attacks (PCBAs) aim to covertly inject attacker-desired behavior into DNNs by merely poisoning the dataset without changing the labels. To effectively implant a backdoor, multiple triggers have been proposed for various attack requirements of Attack Success Rate (ASR) and stealthiness. Additionally, sample selection enhances the ASR of clean-label backdoor attacks by meticulously selecting "hard" samples instead of random samples to poison. Current methods, however, usually handle sample selection and triggers in isolation, leading to limited performance on both ASR and stealthiness when converted to PCBAs. Therefore, we seek to explore the bi-directional collaborative relations between sample selection and triggers to address the above dilemma.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Asia > China > Guangdong Province (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(3 more...)

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

Neural Information Processing Systems | Jun-21-2026, 16:20:00 GMT

We introduce ThinkLite-VL, a family of visual reasoning models that achieve state-of-the-art (SoTA) performance using an order of magnitude fewer training samples, relying purely on reinforcement fine-tuning (RFT) self-improvement without any knowledge distillation. Our central insight is that sample difficulty critically influences RFT effectiveness: appropriately challenging examples can drive substantial reasoning improvements, even in low-data regimes. However, quantifying sample difficulty in a reliable and scalable manner remains non-trivial. To address this, we repurpose Monte Carlo Tree Search (MCTS) to measure sample difficulty via the number of reasoning iterations a vision-language model (VLM) requires to solve each instance. This MCTS-based selection procedure identifies samples that induce deeper reasoning while remaining solvable, allowing us to filter a high-quality subset from 70k open-source examples spanning math, natural image understanding, and chart comprehension. Using this approach, we select just 11k challenging samples for RFT on Qwen2.5-VL-7B-Instruct and 7.5k samples for Qwen2.5-VL-72B-Instruct. The resulting models, ThinkLite-VL-7B and ThinkLite-VL-72B, significantly outperform their respective base models across eight visual reasoning benchmarks.
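
The selection step described in this abstract can be sketched as a simple filter, assuming the tree search has already been run and the number of iterations needed per sample has been recorded. The names `SearchRecord` and `select_challenging`, the `min_iters` parameter, and the toy values are hypothetical and not taken from the paper; the sketch only shows the idea of keeping samples that need deeper search yet are still solved within the budget.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchRecord:
    sample_id: str
    # Iterations the search needed to reach a correct answer,
    # or None if no correct answer was found within the budget.
    iters_to_solve: Optional[int]


def select_challenging(records: List[SearchRecord], min_iters: int) -> List[str]:
    """Keep samples that were solved, but only after at least `min_iters` iterations."""
    return [
        r.sample_id
        for r in records
        if r.iters_to_solve is not None and r.iters_to_solve >= min_iters
    ]


# Toy records (made-up values, for illustration only).
records = [
    SearchRecord("easy", 1),
    SearchRecord("medium", 6),
    SearchRecord("hard", 20),
    SearchRecord("unsolved", None),
]
selected = select_challenging(records, min_iters=5)
```

Here the threshold acts as the difficulty cut-off: raising `min_iters` yields a smaller and harder subset.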

arxiv preprint arxiv, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.94)
(2 more...)

Sample Selection for Fair and Robust Training

Neural Information Processing Systems | Apr-24-2026, 12:47:38 GMT

Fairness and robustness are critical elements of Trustworthy AI that need to be addressed together. Fairness is about learning an unbiased model while robustness is about learning from corrupted data, and it is known that addressing only one of them may have an adverse effect on the other. In this work, we propose a sample selection-based algorithm for fair and robust training. To this end, we formulate a combinatorial optimization problem for the unbiased selection of samples in the presence of data corruption. Observing that solving this optimization problem is strongly NP-hard, we propose a greedy algorithm that is efficient and effective in practice. Experiments show that our algorithm obtains fairness and robustness that are better than or comparable to the state-of-the-art technique, on both synthetic and real benchmark datasets. Moreover, unlike other fair and robust training baselines, our algorithm can be used by only modifying the sampling step in batch selection without changing the training algorithm or leveraging additional clean data.
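
As a rough illustration of selecting samples inside the batch-sampling step, the sketch below greedily picks the lowest-loss samples (a common proxy for clean data) under an equal per-group quota. It is a simplified stand-in, not the paper's algorithm or its optimization problem; the function name `greedy_fair_clean_selection`, the equal-quota rule, and the toy data are all assumptions.

```python
from collections import Counter
from typing import List, Sequence


def greedy_fair_clean_selection(
    losses: Sequence[float], groups: Sequence[str], budget: int
) -> List[int]:
    """Greedily pick up to `budget` low-loss indices with an equal per-group cap."""
    if not losses:
        return []
    n_groups = len(set(groups))
    quota = budget // n_groups  # assumed fairness rule: equal share per group
    taken = Counter()
    chosen = []
    # Visit samples from lowest to highest loss (likely-clean samples first).
    for i in sorted(range(len(losses)), key=lambda j: losses[j]):
        if len(chosen) == budget:
            break
        if taken[groups[i]] < quota:
            chosen.append(i)
            taken[groups[i]] += 1
    return sorted(chosen)


# Toy data: indices 3 and 6 have large losses and play the role of corrupted samples.
losses = [0.1, 0.2, 0.3, 2.5, 0.4, 0.5, 3.0, 0.6]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
selected = greedy_fair_clean_selection(losses, groups, budget=4)
```

Only the choice of indices changes; the training loop that consumes the selected batch is left untouched, which is the property the abstract highlights.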

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin (0.14)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Boundary Matters: A Bi-Level Active Finetuning Method

Neural Information Processing Systems | Mar-19-2026, 19:36:50 GMT

The pretraining-finetuning paradigm has gained widespread adoption in vision tasks and other fields. However, the finetuning phase still requires high-quality annotated samples. To overcome this challenge, the concept of active finetuning has emerged, aiming to select the most appropriate samples for model finetuning within a limited budget.

artificial intelligence, machine learning, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.38)

Boundary Matters: A Bi-Level Active Finetuning Method Han Lu

Neural Information Processing Systems | Feb-11-2026, 18:06:08 GMT

The pretraining-finetuning paradigm has gained widespread adoption in vision tasks and other fields. However, the finetuning phase still requires high-quality annotated samples.

machine learning, natural language, selection, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

e97a4f04ef1b914f6a1698caa364f693-Paper.pdf

Neural Information Processing Systems | Feb-11-2026, 17:16:02 GMT

covid-19, infection rate, risk factor, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
Europe (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(2 more...)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.41)

Title

Author

Neural Information Processing Systems | Feb-10-2026, 23:11:08 GMT

learning, neural network, noisy label, (15 more...)

Neural Information Processing Systems

Country: North America > Canada (0.04)

Genre: Research Report (0.47)

Industry:

Health & Medicine (0.93)
Education > Educational Setting (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.48)

Estimating Noise Transition Matrix with Label Correlations for Noisy Multi-Label Learning

Neural Information Processing Systems | Feb-10-2026, 23:02:09 GMT

In label-noise learning, the noise transition matrix, bridging the class posterior for noisy and clean data, has been widely exploited to learn statistically consistent classifiers.
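
For readers unfamiliar with the term, here is a minimal sketch of how a transition matrix links the two posteriors, assuming the simpler single-label, two-class case rather than the multi-label setting of the paper. The entry `T[i][j]` is taken to be the probability that clean class `i` is observed as noisy class `j`; the function name `noisy_posterior` and all numbers are made up for illustration.

```python
from typing import List


def noisy_posterior(clean: List[float], T: List[List[float]]) -> List[float]:
    """Return p(noisy = j | x) = sum_i p(clean = i | x) * T[i][j]."""
    n_classes = len(T[0])
    return [
        sum(clean[i] * T[i][j] for i in range(len(clean)))
        for j in range(n_classes)
    ]


# Illustrative values: each row of T sums to 1.
T = [[0.8, 0.2],
     [0.3, 0.7]]
clean = [0.9, 0.1]
noisy = noisy_posterior(clean, T)
```

With these values the noisy posterior is approximately [0.75, 0.25]: some of the mass on the first class leaks to the second because 20% of its labels are assumed to be flipped.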

artificial intelligence, machine learning, transition matrix, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Appendix for Task-Free Continual Learning Via Online Discrepancy Distance Learning

Neural Information Processing Systems | Feb-10-2026, 21:31:28 GMT

Theorem 1. Let Pi represent the distribution of all seen training samples (including all previous ... A good trade-off between the model's complexity and generalization performance, observed from Eq. (12), is allowing each component to learn the underlying data distribution of a unique target set. By satisfying the ideal selection process (Eq. (22) of the paper) and also considering that each component Gt finished the training on Mkt at Tkt, we assume that the dynamic expansion model G can be seen as a single model h trained on all previously learnt memories ... Maximal Interfered Retrieval (MIR) [1] is one of the most popular memory-based approaches, which uses a memory buffer with a sample selection criterion. Since Pi would involve several underlying data distributions as the number of training steps (i) increases, the diversity in the memory plays an important role to ensure a tight GB in Eq. (15). G be single model which consists of a classifier h ∈ H and a VAE model v. M be a memory buffer updated at the training step Ti. Figure 1: The learning process of the proposed ODDL-S, which consists of three phases.

artificial intelligence, machine learning, of the paper, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.66)
